Results 1 - 20 of 59
1.
medRxiv ; 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38559045

ABSTRACT

Importance: Diagnostic errors are common and cause significant morbidity. Large language models (LLMs) have shown promise in their performance on both multiple-choice and open-ended medical reasoning examinations, but it remains unknown whether the use of such tools improves diagnostic reasoning. Objective: To assess the impact of the GPT-4 LLM on physicians' diagnostic reasoning compared to conventional resources. Design: Multi-center, randomized clinical vignette study. Setting: The study was conducted using remote video conferencing with physicians across the United States and in-person participation at multiple academic medical institutions. Participants: Resident and attending physicians with training in family medicine, internal medicine, or emergency medicine. Interventions: Participants were randomized to access GPT-4 in addition to conventional diagnostic resources or to conventional resources alone. They were allocated 60 minutes to review up to six clinical vignettes adapted from established diagnostic reasoning exams. Main Outcomes and Measures: The primary outcome was diagnostic performance based on differential diagnosis accuracy, appropriateness of supporting and opposing factors, and next diagnostic evaluation steps. Secondary outcomes included time spent per case and final diagnosis. Results: 50 physicians (26 attendings, 24 residents) participated, with an average of 5.2 cases completed per participant. The median diagnostic reasoning score per case was 76.3 percent (IQR 65.8 to 86.8) for the GPT-4 group and 73.7 percent (IQR 63.2 to 84.2) for the conventional resources group, with an adjusted difference of 1.6 percentage points (95% CI -4.4 to 7.6; p=0.60). The median time spent on cases for the GPT-4 group was 519 seconds (IQR 371 to 668 seconds), compared to 565 seconds (IQR 456 to 788 seconds) for the conventional resources group, with a time difference of -82 seconds (95% CI -195 to 31; p=0.20). GPT-4 alone scored 15.5 percentage points (95% CI 1.5 to 29; p=0.03) higher than the conventional resources group. Conclusions and Relevance: In a clinical vignette-based study, the availability of GPT-4 to physicians as a diagnostic aid did not significantly improve clinical reasoning compared to conventional resources, although it may improve components of clinical reasoning such as efficiency. GPT-4 alone demonstrated higher performance than both physician groups, suggesting opportunities for further improvement in physician-AI collaboration in clinical practice.
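
For readers who want to reproduce this style of adjusted between-arm comparison, a minimal sketch follows. The abstract does not specify the study's exact model; the sketch assumes a linear regression of per-case score on study arm with case fixed effects and participant-clustered standard errors, and the file and column names (`vignette_scores.csv`, `score`, `arm`, `case_id`, `participant_id`) are illustrative.

```python
# Minimal sketch: adjusted between-arm difference with participant-clustered
# standard errors. The model specification and column names are assumptions,
# not the study's published analysis code.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("vignette_scores.csv")  # hypothetical: one row per case attempt

model = smf.ols("score ~ C(arm) + C(case_id)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["participant_id"]}
)
print(model.summary())  # the C(arm) coefficient estimates the adjusted difference
```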

3.
Nat Med ; 30(4): 1134-1142, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38413730

ABSTRACT

Analyzing vast textual data and summarizing key information from electronic health records imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown promise in natural language processing (NLP) tasks, their effectiveness on a diverse range of clinical summarization tasks remains unproven. Here we applied adaptation methods to eight LLMs, spanning four distinct clinical summarization tasks: radiology reports, patient questions, progress notes and doctor-patient dialogue. Quantitative assessments with syntactic, semantic and conceptual NLP metrics reveal trade-offs between models and adaptation methods. A clinical reader study with 10 physicians evaluated summary completeness, correctness and conciseness; in most cases, summaries from our best-adapted LLMs were deemed either equivalent (45%) or superior (36%) compared with summaries from medical experts. The ensuing safety analysis highlights challenges faced by both LLMs and medical experts, as we connect errors to potential medical harm and categorize types of fabricated information. Our research provides evidence of LLMs outperforming medical experts in clinical text summarization across multiple tasks. This suggests that integrating LLMs into clinical workflows could alleviate documentation burden, allowing clinicians to focus more on patient care.
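
The abstract names syntactic, semantic, and conceptual NLP metrics without listing them; as one hedged illustration of the syntactic side, the sketch below scores a model summary against an expert reference with ROUGE via the `rouge-score` package. The example strings are invented.

```python
# Illustrative only: scoring a model summary against an expert reference
# with ROUGE. The paper's full metric suite is not specified in the abstract.
from rouge_score import rouge_scorer  # pip install rouge-score

expert = "No acute cardiopulmonary abnormality."
model = "No acute abnormality in the chest."

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(expert, model)  # reference first, candidate second
for name, s in scores.items():
    print(f"{name}: precision={s.precision:.2f} recall={s.recall:.2f} f1={s.fmeasure:.2f}")
```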


Subject(s)
Documentation , Semantics , Humans , Electronic Health Records , Natural Language Processing , Physician-Patient Relations
4.
BMC Med Educ ; 24(1): 185, 2024 Feb 23.
Article in English | MEDLINE | ID: mdl-38395858

ABSTRACT

BACKGROUND: The increasing linguistic and cultural diversity in the United States underscores the necessity of enhancing healthcare professionals' cross-cultural communication skills. This study focuses on incorporating interpreter and limited-English proficiency (LEP) patient training into the medical and physician assistant student curriculum. The aim is to improve equitable care provision, addressing the vulnerability of LEP patients to healthcare disparities, including medical errors and reduced access to care. Though such training is recognized as crucial, opportunities in medical curricula remain limited. METHODS: To bridge this gap, a novel initiative was introduced in a medical school, involving second-year students in clinical sessions with actual LEP patients and interpreters. These sessions featured interpreter input, patient interactions, and feedback from interpreters and clinical preceptors. A survey assessed the perspectives of students, preceptors, and interpreters. RESULTS: Outcomes revealed positive reception of interpreter and LEP patient integration. Students gained confidence in working with interpreters and valued interpreter feedback. Preceptors recognized the sessions' value in preparing students for future clinical interactions. CONCLUSIONS: This study underscores the importance of involving experienced interpreters in training students for real-world interactions with LEP patients. Early interpreter training enhances students' communication skills and ability to serve linguistically diverse populations. Further exploration could expand languages and interpretation modes and assess long-term effects on students' clinical performance. By effectively training future healthcare professionals to navigate language barriers and cultural diversity, this research contributes to equitable patient care in diverse communities.


Subject(s)
Physician Assistants , Students, Medical , Humans , United States , Cross-Cultural Comparison , Translating , Communication , Communication Barriers , Physician-Patient Relations
5.
BMC Health Serv Res ; 24(1): 204, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38355492

ABSTRACT

BACKGROUND: We identified that Stanford Health Care had a significant number of patients who, after discharge, were found by the utilization review committee not to meet the Centers for Medicare and Medicaid Services (CMS) 2-midnight benchmark for inpatient status. Some of the charges incurred during the care of these patients are written off and known as Medicare 1-day write-offs. This study aims to evaluate the use of a Best Practice Alert (BPA) feature in the Epic electronic medical record to ensure appropriate designation of a patient's hospitalization status as either inpatient or outpatient in accordance with the CMS 2-midnight length-of-stay benchmark, thereby reducing the number of associated write-offs. METHOD: We incorporated a BPA into the Epic electronic medical record (EMR) that would prompt the discharging provider and the case manager to review the patient's inpatient designation prior to discharge and change the designation to observation when deemed appropriate. Patients who met the inclusion criteria (Medicare fee-for-service insurance, inpatient length of stay (LOS) of less than 2 midnights, inpatient designation as hospitalization status at the time of discharge, hospitalization at an acute level of care, and membership in one of 37 listed hospital services at the time the discharge order was signed) were randomized to have the BPA either silent or active over a three-month period from July 18, 2019, to October 18, 2019. RESULT: A total of 88 patients were included in this study: 40 in the control arm and 48 in the intervention arm. In the intervention arm, 8 patients (8/48, 16.7%) had an inpatient status designation despite potentially meeting Medicare guidelines for an observation stay, compared with 23 patients (23/40, 57.5%) in the control arm (p = 0.001). The estimated number of write-offs in the control arm was 17 (73.9% of 23 inpatient patients), compared with 1 (12.5% of 8 inpatient patients) in the intervention arm, after accounting for patients who may have met inpatient criteria for other reasons based on case manager note review. CONCLUSION: This is the first time to our knowledge that a BPA has been used in this manner to reduce the number of Medicare 1-day write-offs.


Subject(s)
Medicare , Quality Improvement , Aged , Humans , United States , Hospitalization , Length of Stay , Patient Discharge
6.
Res Sq ; 2023 Oct 30.
Article in English | MEDLINE | ID: mdl-37961377

ABSTRACT

Sifting through vast textual data and summarizing key information from electronic health records (EHR) imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown immense promise in natural language processing (NLP) tasks, their efficacy on a diverse range of clinical summarization tasks has not yet been rigorously demonstrated. In this work, we apply domain adaptation methods to eight LLMs, spanning six datasets and four distinct clinical summarization tasks: radiology reports, patient questions, progress notes, and doctor-patient dialogue. Our thorough quantitative assessment reveals trade-offs between models and adaptation methods in addition to instances where recent advances in LLMs may not improve results. Further, in a clinical reader study with ten physicians, we show that summaries from our best-adapted LLMs are preferable to human summaries in terms of completeness and correctness. Our ensuing qualitative analysis highlights challenges faced by both LLMs and human experts. Lastly, we correlate traditional quantitative NLP metrics with reader study scores to enhance our understanding of how these metrics align with physician preferences. Our research marks the first evidence of LLMs outperforming human experts in clinical text summarization across multiple tasks. This implies that integrating LLMs into clinical workflows could alleviate documentation burden, empowering clinicians to focus more on personalized patient care and the inherently human aspects of medicine.
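
The metric-versus-reader correlation described here can be reproduced with a rank correlation; a minimal sketch with hypothetical per-summary values:

```python
# Minimal sketch: rank-correlating an automated NLP metric with physician
# reader scores across summaries. Values are hypothetical placeholders.
from scipy.stats import spearmanr

metric_scores = [0.42, 0.55, 0.38, 0.61, 0.47]   # e.g., per-summary ROUGE-L
reader_scores = [3.0, 4.5, 2.5, 5.0, 3.5]        # e.g., mean physician rating

rho, p = spearmanr(metric_scores, reader_scores)
print(f"Spearman rho={rho:.2f}, p={p:.3f}")
```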

7.
JAMA Intern Med ; 183(9): 1028-1030, 2023 09 01.
Article in English | MEDLINE | ID: mdl-37459090

ABSTRACT

This study compares performance on free-response clinical reasoning examinations of first- and second-year medical students vs 2 models of a popular chatbot.


Subject(s)
Students, Medical , Humans , Educational Measurement/methods , Physical Examination , Software , Clinical Reasoning
8.
medRxiv ; 2023 Mar 29.
Article in English | MEDLINE | ID: mdl-37034742

ABSTRACT

Importance: Studies show that ChatGPT, a general-purpose large language model chatbot, could pass the multiple-choice US Medical Licensing Exams, but the model's performance on open-ended clinical reasoning is unknown. Objective: To determine whether ChatGPT is capable of consistently meeting the passing threshold on free-response, case-based clinical reasoning assessments. Design: Fourteen multi-part cases were selected from clinical reasoning exams administered to pre-clerkship medical students between 2019 and 2022. For each case, the questions were run through ChatGPT twice and responses were recorded. Two clinician educators independently graded each run according to a standardized grading rubric. To further assess the degree of variation in ChatGPT's performance, we repeated the analysis on a single high-complexity case 20 times. Setting: A single US medical school. Participants: ChatGPT. Main Outcomes and Measures: Passing rate of ChatGPT's scored responses and the range in model performance across multiple run-throughs of a single case. Results: 12 of the 28 ChatGPT exam responses achieved a passing score (43%), with a mean score of 69% (95% CI: 65% to 73%) compared to the established passing threshold of 70%. When given the same case 20 separate times, ChatGPT's performance varied, with scores ranging from 56% to 81%. Conclusions and Relevance: ChatGPT's ability to achieve a passing performance in nearly half of the cases analyzed demonstrates the need to revise clinical reasoning assessments and incorporate artificial intelligence (AI)-related topics into medical curricula and practice.
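
The run-to-run variability reported here (scores from 56% to 81% on one case) can be summarized with basic spread statistics; a minimal sketch with hypothetical per-run scores spanning that range:

```python
# Minimal sketch: summarizing score spread across repeated runs of the
# same case. The 20 per-run scores below are hypothetical placeholders
# consistent with the reported 56-81% range, not the study's data.
import statistics

scores = [56, 61, 64, 68, 70, 71, 72, 73, 74, 75,
          75, 76, 77, 78, 78, 79, 80, 80, 81, 81]  # percent, hypothetical

mean = statistics.mean(scores)
sd = statistics.stdev(scores)
print(f"mean={mean:.1f}%, sd={sd:.1f}, range={min(scores)}-{max(scores)}%")
```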

9.
J Obes Metab Syndr ; 31(3): 277-281, 2022 Sep 30.
Article in English | MEDLINE | ID: mdl-36058896

ABSTRACT

Background: The mechanism for the possible association between obesity and poor clinical outcomes from Coronavirus Disease 2019 (COVID-19) remains unclear. Methods: We analyzed 22,915 adult COVID-19 patients hospitalized to non-intensive care from March 2020 to April 2021, using the American Heart Association National COVID Registry. A multivariable Poisson model adjusted for age, sex, medical history, admission respiratory status, hospitalization characteristics, and laboratory findings was used to model length of stay (LOS) as a function of body mass index (BMI). For comparison, we similarly analyzed 5,327 patients admitted to intensive care. Results: Relative to subjects with normal BMI, overweight, class I obese, and class II obese patients had approximately half-day reductions in LOS (-0.469 days, P<0.01; -0.480 days, P<0.01; -0.578 days, P<0.01, respectively). Conclusion: The model identified a dose-dependent, inverse relationship between BMI category and LOS for COVID-19, which was not seen when the model was applied to critically ill patients.
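
A multivariable Poisson model of LOS by BMI category, as described in the Methods, can be sketched as follows; the file name, column names, and covariate list are illustrative, not the registry's actual schema.

```python
# Minimal sketch of a multivariable Poisson model for length of stay (LOS)
# by BMI category. Column names are illustrative; the registry's schema and
# full covariate list are not reproduced here.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("covid_admissions.csv")  # hypothetical extract

model = smf.glm(
    "los_days ~ C(bmi_category, Treatment('normal')) + age + C(sex)",
    data=df,
    family=sm.families.Poisson(),
).fit()
print(model.summary())  # exponentiate BMI coefficients for LOS rate ratios
```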

11.
Vox Sang ; 117(1): 87-93, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34081800

ABSTRACT

BACKGROUND AND OBJECTIVES: Inappropriate platelet transfusions represent an opportunity for improvements in patient care. Use of a best practice alert (BPA) as clinical decision support (CDS) for red cell transfusions has successfully reduced unnecessary red blood cell (RBC) transfusions in prior studies. We studied the impact of a platelet transfusion BPA with visibility randomized by patient chart. MATERIALS AND METHODS: A BPA was built to introduce CDS at the time of platelet ordering in the electronic health record. Alert visibility was randomized at the patient encounter level. BPA-eligible platelet transfusions for patients with both visible and non-visible alerts were recorded, along with reasons given for override of the BPA. Focused interviews were performed with providers who interacted with the BPA to assess its impact on their decision making. RESULTS: Over a 9-month study period, 446 patient charts were randomized. The visible alert group used 25.3% fewer BPA-eligible platelets. Mean monthly usage of platelets eligible for BPA display was 65.7 for the control group and 49.1 for the visible alert group (p = 0.07). BPA-eligible platelets used per inpatient day at risk per month were not significantly different between groups (2.4 vs. 2.1, p = 0.53). CONCLUSION: It is feasible to study CDS via chart-based randomization. A platelet BPA reduced total platelets used over the study period and may have resulted in $151,069 in yearly savings, although there were no differences when adjusted for inpatient days at risk. During interviews, providers offered additional workflow insights, allowing further improvement of CDS for platelet transfusions.
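
Encounter-level randomization of alert visibility can be implemented with a deterministic hash so that every order in the same encounter sees the same arm; the sketch below shows only that general pattern, not the study's actual EHR build, and the identifiers are hypothetical.

```python
# Illustrative sketch: deterministic encounter-level randomization of a
# best practice alert (BPA). The study's actual EHR mechanism is not
# described in the abstract; this shows only the general pattern.
import hashlib

def bpa_visible(encounter_id: str, salt: str = "platelet-bpa-v1") -> bool:
    """Assign an encounter to the visible-alert arm, stably across orders."""
    digest = hashlib.sha256(f"{salt}:{encounter_id}".encode()).hexdigest()
    return int(digest, 16) % 2 == 0  # ~50/50 split, fixed per encounter

print(bpa_visible("ENC-0001"), bpa_visible("ENC-0002"))
```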


Subject(s)
Decision Support Systems, Clinical , Platelet Transfusion , Blood Platelets , Electronic Health Records , Erythrocyte Transfusion , Humans
13.
Patterns (N Y) ; 2(3): 100213, 2021 Mar 12.
Article in English | MEDLINE | ID: mdl-33748796

ABSTRACT

Jupyter Notebooks have transformed the communication of data analysis pipelines by facilitating a modular structure that brings together code, markdown text, and interactive visualizations. Here, we extended Jupyter Notebooks to broaden their accessibility with Appyters. Appyters turn Jupyter Notebooks into fully functional standalone web-based bioinformatics applications. Appyters present to users an entry form enabling them to upload their data and set various parameters for a multitude of data analysis workflows. Once the form is filled, the Appyter executes the corresponding notebook in the cloud, producing the output without requiring the user to interact directly with the code. Appyters were used to create many bioinformatics web-based reusable workflows, including applications to build customized machine learning pipelines, analyze omics data, and produce publishable figures. These Appyters are served in the Appyters Catalog at https://appyters.maayanlab.cloud. In summary, Appyters enable the rapid development of interactive web-based bioinformatics applications.
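
As an illustration of the general pattern Appyters build on (executing a notebook with user-supplied parameters), the sketch below uses papermill; Appyters' own templating machinery is not shown in the abstract, and the notebook and parameter names are hypothetical.

```python
# Illustration of the general pattern: executing a notebook with
# user-supplied parameters. This uses papermill as a stand-in; it is not
# Appyters' own templating API.
import papermill as pm  # pip install papermill

pm.execute_notebook(
    "analysis_template.ipynb",   # hypothetical notebook with a "parameters" cell
    "analysis_output.ipynb",
    parameters={"input_file": "expression_matrix.tsv", "top_n_genes": 250},
)
```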

14.
J Grad Med Educ ; 13(1): 76-82, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33680304

ABSTRACT

BACKGROUND: There is insufficient knowledge about how personal access to handheld ultrasound devices (HUDs) improves trainee learning with point-of-care ultrasound (POCUS). OBJECTIVE: To assess whether HUDs, alongside a yearlong lecture series, improved trainee POCUS usage and ability to acquire images. METHODS: Internal medicine intern physicians (n = 47) at a single institution from 2017 to 2018 were randomized 1:1 to receive personal HUDs (n = 24) for patient care and self-directed learning vs no HUDs (n = 23). All interns received a repeated lecture series on cardiac, thoracic, and abdominal POCUS. Main outcome measures included self-reported HUD usage rates and post-intervention assessment scores on the Rapid Assessment of Competency in Echocardiography (RACE) scale between HUD and no-HUD groups. RESULTS: HUD interns reported performing POCUS assessments on patients a mean of 6.8 (SD 2.2) times per week vs 6.4 (SD 2.9) times per week in the non-HUD arm (P = .66). There was no relationship between the number of self-reported examinations per week and a trainee's post-intervention RACE score (rho = 0.022, P = .95). HUD interns did not have significantly higher post-intervention RACE scores (median HUD score 17.0 vs no-HUD score 17.8; P = .72). Trainee confidence with cardiac POCUS did not correlate with RACE scores. CONCLUSIONS: Personal HUDs without direct supervision did not increase POCUS usage or improve interns' image acquisition abilities. Interns who reported performing more examinations per week did not have higher RACE scores. Improved HUD access and lectures without additional feedback may not improve POCUS mastery.


Subject(s)
Internship and Residency , Clinical Competence , Humans , Internal Medicine/education , Point-of-Care Systems , Ultrasonography
15.
FASEB Bioadv ; 3(2): 110-117, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33615156

ABSTRACT

The COVID-19 pandemic forced medical schools to rapidly transform their curricula using online learning approaches. At our institution, the preclinical Practice of Medicine (POM) course was transitioned to large-group, synchronous, video-conference sessions. The aim of this study is to assess whether there were differences in learner engagement, as evidenced by student question-asking behaviors, between in-person and video-conference sessions in one preclinical medical student course. In spring 2020, large-group didactic sessions in POM were converted to video-conference sessions. During these sessions, student microphones were muted and video capabilities were turned off. Students submitted typed questions via a Q&A box, which was monitored by a senior student teaching assistant. We compared student question-asking behavior in recorded video-conference course sessions from POM in spring 2020 with matched, recorded, in-person sessions from the same course in spring 2019. We found that, on average, instructors answered a greater number of student questions and spent a greater percentage of time on Q&A in the online sessions compared with the in-person sessions. We also found that students asked a greater number of higher-complexity questions in the online version of the course compared with the in-person course. The video-conference learning environment can promote higher student engagement than the in-person learning environment, as measured by student question-asking behavior. Developing an understanding of the specific elements of the online learning environment that foster student engagement has important implications for instructional design in both online and in-person settings.

16.
Genet Med ; 23(2): 259-271, 2021 02.
Article in English | MEDLINE | ID: mdl-33093671

ABSTRACT

PURPOSE: The NIH Undiagnosed Diseases Network (UDN) evaluates participants with disorders that have defied diagnosis, applying personalized clinical and genomic evaluations and innovative research. The clinical sites of the UDN are essential to advancing the UDN mission; this study assesses their contributions relative to standard clinical practices. METHODS: We analyzed retrospective data from four UDN clinical sites, from July 2015 to September 2019, for diagnoses, new disease gene discoveries and the underlying investigative methods. RESULTS: Of 791 evaluated individuals, 231 received 240 diagnoses and 17 new disease-gene associations were recognized. Straightforward diagnoses on UDN exome and genome sequencing occurred in 35% (84/240). We considered these tractable in standard clinical practice, although genome sequencing is not yet widely available clinically. The majority (156/240, 65%) required additional UDN-driven investigations, including 90 diagnoses that occurred after prior nondiagnostic exome sequencing and 45 diagnoses (19%) that were nongenetic. The UDN-driven investigations included complementary/supplementary phenotyping, innovative analyses of genomic variants, and collaborative science for functional assays and animal modeling. CONCLUSION: Investigations driven by the clinical sites identified diagnostic and research paradigms that surpass standard diagnostic processes. The new diagnoses, disease gene discoveries, and delineation of novel disorders represent a model for genomic medicine and science.


Subject(s)
Undiagnosed Diseases , Animals , Genomics , Humans , Rare Diseases/diagnosis , Rare Diseases/genetics , Retrospective Studies , Exome Sequencing
17.
Postgrad Med J ; 97(1144): 97-102, 2021 Feb.
Article in English | MEDLINE | ID: mdl-32051280

ABSTRACT

BACKGROUND: Repetitive laboratory testing in stable patients is low-value care. Electronic health record (EHR)-based interventions are easy to disseminate but can be restrictive. OBJECTIVE: To evaluate the effect of a minimally restrictive EHR-based intervention on utilisation. SETTING: One year before and after intervention at a 600-bed tertiary care hospital, covering 18 000 patients admitted to General Medicine, General Surgery and the Intensive Care Unit (ICU). INTERVENTION: Providers were required to specify the number of times each test should occur instead of being able to order them indefinitely. MEASUREMENTS: For eight tests, utilisation (number of labs performed per patient day) and the number of associated orders were measured. RESULTS: Utilisation decreased for some tests on all services. Notably, complete blood count with differential decreased 9% (p<0.001) on General Medicine and 21% (p<0.001) in the ICU. CONCLUSIONS: Requiring providers to specify the number of occurrences of labs significantly reduces utilisation in some cases.


Subject(s)
Diagnostic Tests, Routine/statistics & numerical data , Electronic Health Records , Practice Patterns, Physicians'/statistics & numerical data , Unnecessary Procedures/statistics & numerical data , Utilization Review , Female , Humans , Male , Middle Aged , Retreatment/statistics & numerical data , Retrospective Studies
18.
J Am Med Inform Assoc ; 27(12): 1850-1859, 2020 12 09.
Article in English | MEDLINE | ID: mdl-33106874

ABSTRACT

OBJECTIVE: To assess the usability and usefulness of a machine learning-based order recommender system applied to simulated clinical cases. MATERIALS AND METHODS: 43 physicians entered orders for 5 simulated clinical cases using a clinical order entry interface, with or without access to a previously developed automated order recommender system. Cases were randomly allocated to the recommender system in a 3:2 ratio. A panel of clinicians scored whether the orders placed were clinically appropriate. Our primary outcome was the difference in clinical appropriateness scores. Secondary outcomes included total number of orders, case time, and survey responses. RESULTS: Clinical appropriateness scores per order were comparable for cases randomized to the order recommender system (mean difference -0.11 points per order, 95% CI: [-0.41, 0.20]). Physicians using the recommender placed more orders (median 16 vs 15 orders, incidence rate ratio 1.09, 95% CI: [1.01, 1.17]). Case times were comparable with the recommender system. Order suggestions generated by the recommender system were more likely to match physician needs than standard manual search options. Physicians used recommender suggestions in 98% of available cases. Approximately 95% of participants agreed the system would be useful for their workflows. DISCUSSION: User testing with a simulated electronic medical record interface can assess the value of machine learning and clinical decision support tools for clinician usability and acceptance before live deployment. CONCLUSIONS: Clinicians can use and accept machine-learned clinical order recommendations integrated into an electronic order entry interface in a simulated setting. The clinical appropriateness of orders entered was comparable even when supported by automated recommendations.
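
The recommender's internals are not detailed in the abstract; a common baseline for this kind of system is item co-occurrence over historical order sessions, sketched below with invented data and order names.

```python
# Hedged sketch of a co-occurrence baseline for clinical order
# recommendation. The paper's actual model is not described in the
# abstract; this illustrates only the general idea.
from collections import Counter
from itertools import combinations

# Hypothetical historical order sessions (one set of orders per encounter)
sessions = [
    {"cbc", "bmp", "blood_culture", "lactate"},
    {"cbc", "bmp", "troponin", "ecg"},
    {"cbc", "blood_culture", "lactate", "ceftriaxone"},
]

co_counts = Counter()
for s in sessions:
    for a, b in combinations(sorted(s), 2):
        co_counts[(a, b)] += 1

def recommend(placed_order: str, k: int = 3) -> list[str]:
    """Rank other orders by how often they co-occur with placed_order."""
    scores = Counter()
    for (a, b), n in co_counts.items():
        if a == placed_order:
            scores[b] += n
        elif b == placed_order:
            scores[a] += n
    return [order for order, _ in scores.most_common(k)]

print(recommend("cbc"))  # e.g., ['blood_culture', 'bmp', 'lactate']
```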


Subject(s)
Decision Support Systems, Clinical , Electronic Health Records , Medical Order Entry Systems , User-Computer Interface , Humans , Information Storage and Retrieval/methods , Machine Learning
19.
Cardiovasc Diagn Ther ; 10(4): 1048-1067, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32968660

ABSTRACT

Carotid artery plaque is a measure of atherosclerosis and is associated with future risk of atherosclerotic cardiovascular disease (ASCVD), which encompasses coronary, cerebrovascular, and peripheral arterial diseases. With advanced imaging techniques, computerized tomography (CT) and magnetic resonance imaging (MRI) have shown their potential superiority to routine ultrasound to detect features of carotid plaque vulnerability, such as intraplaque hemorrhage (IPH), lipid-rich necrotic core (LRNC), fibrous cap (FC), and calcification. The correlation between imaging features and histological changes of carotid plaques has been investigated. Imaging of carotid features has been used to predict the risk of cardiovascular events. Other techniques such as nuclear imaging and intra-vascular ultrasound (IVUS) have also been proposed to better understand the vulnerable carotid plaque features. In this article, we review the studies of imaging specific carotid plaque components and their correlation with risk scores.

20.
Patterns (N Y) ; 1(6): 100090, 2020 Sep 11.
Article in English | MEDLINE | ID: mdl-32838343

ABSTRACT

In a short period, many research publications that report sets of experimentally validated drugs as potential COVID-19 therapies have emerged. To organize this accumulating knowledge, we developed the COVID-19 Drug and Gene Set Library (https://amp.pharm.mssm.edu/covid19/), a collection of drug and gene sets related to COVID-19 research from multiple sources. The platform enables users to view, download, analyze, visualize, and contribute drug and gene sets related to COVID-19 research. To evaluate the content of the library, we compared the results from six in vitro drug screens for COVID-19 repurposing candidates. Surprisingly, we observe low overlap across screens while highlighting overlapping candidates that should receive more attention as potential therapeutics for COVID-19. Overall, the COVID-19 Drug and Gene Set Library can be used to identify community consensus, make researchers and clinicians aware of new potential therapies, enable machine-learning applications, and facilitate the research community to work together toward a cure.
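
The cross-screen overlap comparison can be quantified with pairwise Jaccard similarity over hit lists; a minimal sketch with hypothetical screen contents:

```python
# Minimal sketch: pairwise Jaccard overlap across drug-screen hit lists.
# Screen contents are hypothetical placeholders, not the paper's data.
from itertools import combinations

screens = {
    "screen_a": {"remdesivir", "chloroquine", "niclosamide"},
    "screen_b": {"remdesivir", "amiodarone", "loperamide"},
    "screen_c": {"niclosamide", "loperamide", "ivermectin"},
}

for (name1, hits1), (name2, hits2) in combinations(screens.items(), 2):
    jaccard = len(hits1 & hits2) / len(hits1 | hits2)
    print(f"{name1} vs {name2}: Jaccard = {jaccard:.2f}")
```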
